difference model
Offline RLHF Methods Need More Accurate Supervision Signals
Wang, Shiqi, Zhang, Zhengze, Zhao, Rui, Tan, Fei, Nguyen, Cam Tu
With the rapid advances in Large Language Models (LLMs), aligning LLMs with human preferences has become increasingly important. Although Reinforcement Learning with Human Feedback (RLHF) proves effective, it is complicated and highly resource-intensive. As such, offline RLHF has been introduced as an alternative solution, which directly optimizes LLMs with ranking losses on a fixed preference dataset. Current offline RLHF only captures the "ordinal relationship" between responses, overlooking the crucial aspect of "how much" one is preferred over the others. To address this issue, we propose a simple yet effective solution called Reward Difference Optimization, abbreviated as RDO. Specifically, we introduce reward difference coefficients to reweigh sample pairs in offline RLHF. We then develop a difference model involving rich interactions between a pair of responses for predicting these difference coefficients. Experiments with 7B LLMs on the HH and TL;DR datasets substantiate the effectiveness of our method in both automatic metrics and human evaluation, thereby highlighting its potential for aligning LLMs with human intent and values.
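The reweighting idea in the abstract can be sketched as follows. This is a minimal illustration and not the paper's actual loss: the abstract does not give the formula, so the multiplicative use of the coefficient, the log-sigmoid pairwise loss, and the names `pairwise_ranking_loss` and `reweighted_ranking_loss` are all assumptions made for this example.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_ranking_loss(score_chosen: float, score_rejected: float) -> float:
    """Ordinal-only loss: penalizes ranking the rejected response higher."""
    return -log_sigmoid(score_chosen - score_rejected)


def reweighted_ranking_loss(pairs) -> float:
    """Mean pairwise loss where each pair is scaled by a coefficient.

    `pairs` is a list of (score_chosen, score_rejected, coeff) tuples.
    `coeff` stands in for a reward difference coefficient (assumed here
    to be a non-negative multiplicative weight; hypothetical form).
    """
    total = sum(c * pairwise_ranking_loss(sc, sr) for sc, sr, c in pairs)
    return total / len(pairs)


# Toy scores: both pairs are ranked equally badly, but the second pair is
# assumed to have a much larger preference gap, so it contributes more.
toy_pairs = [(0.0, 0.0, 0.5), (0.0, 0.0, 2.0)]
print(reweighted_ranking_loss(toy_pairs))
```

With all coefficients set to 1, the function reduces to the plain mean ranking loss, which is the "ordinal relationship only" setting the abstract describes.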
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > Middle East > Jordan (0.04)
Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations
Jentzen, Arnulf, Riekert, Adrian, von Wurstemberger, Philippe
Deep learning approximation methods - usually consisting of deep artificial neural networks (ANNs) trained through stochastic gradient descent (SGD) optimization methods - are nowadays among the most heavily employed approximation methods in the digital world. The striking feature of deep learning methods is that in many situations numerical simulations suggest that the computational effort of such methods seems to grow at most polynomially in the input dimension d ∈ ℕ = {1, 2, 3, ...} of the problem under consideration. In contrast, classical numerical methods usually suffer from the so-called curse of dimensionality (cf., e.g., Bellman [4], Novak & Wozniakowski [37, Chapter 1], and Novak & Wozniakowski [38, Chapter 9]) in the sense that the computational effort grows at least exponentially in the dimension. In recent years, deep learning technologies have also been intensively used to attack problems from scientific computing such as the numerical solution of partial differential equations (PDEs). In particular, deep learning approximation methods have been used to approximately solve high-dimensional nonlinear PDEs (see, e.g., [2,5,10,11,14,16,25,42] and the references mentioned therein) such as high-dimensional nonlinear pricing problems from financial engineering and Hamilton-Jacobi-Bellman equations from optimal control. In the context of such high-dimensional nonlinear PDEs, the progress of deep learning approximation methods is obvious as there are - except in some special cases (see, e.g., [19, 20, 36] and the references therein for branching-type methods and see, e.g., [11-13, 22] and the references therein for multilevel Picard methods) - essentially no alternative numerical approximation methods which are capable of solving such high-dimensional nonlinear PDEs. There is nowadays also a huge literature on deep learning approximation methods for low-dimensional PDEs (cf., e.g., [24, 41]).
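The contrast between exponential and polynomial growth in the dimension d can be made concrete with a small calculation. The sketch below counts the nodes of a full tensor-product grid, a standard example of a cost that grows exponentially in d; the choice of 10 points per axis and the cubic comparison curve are arbitrary illustrative assumptions, not figures from the paper.

```python
def grid_points(points_per_axis: int, d: int) -> int:
    """Number of nodes in a full tensor-product grid in d dimensions."""
    return points_per_axis ** d


def cubic_cost(d: int) -> int:
    """An arbitrary polynomial cost in d, used only for comparison."""
    return d ** 3


# Exponential growth overtakes any fixed polynomial very quickly.
for d in (1, 2, 5, 10, 20):
    print(f"d={d:2d}  grid nodes={grid_points(10, d):.3e}  d^3={cubic_cost(d)}")
```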
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Torus Graphs for Multivariate Phase Coupling Analysis
Klein, Natalie, Orellana, Josue, Brincat, Scott, Miller, Earl K., Kass, Robert E.
Angular measurements are often modeled as circular random variables, where there are natural circular analogues of moments, including correlation. Because a product of circles is a torus, a d-dimensional vector of circular random variables lies on a d-dimensional torus. For such vectors we present here a class of graphical models, which we call torus graphs, based on the full exponential family with pairwise interactions. The topological distinction between a torus and Euclidean space has several important consequences. Our development was motivated by the problem of identifying phase coupling among oscillatory signals recorded from multiple electrodes in the brain: oscillatory phases across electrodes might tend to advance or recede together, indicating coordination across brain areas. The data analyzed here consisted of 24 phase angles measured repeatedly across 840 experimental trials (replications) during a memory task, where the electrodes were in 4 distinct brain regions, all known to be active while memories are being stored or retrieved. In realistic numerical simulations, we found that a standard pairwise assessment, known as phase locking value, is unable to describe multivariate phase interactions, but that torus graphs can accurately identify conditional associations. Torus graphs generalize several more restrictive approaches that have appeared in various scientific literatures, and produced intuitive results in the data we analyzed. Torus graphs thus unify multivariate analysis of circular data and present fertile territory for future research.
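For reference, the pairwise phase locking value (PLV) that the abstract compares against is commonly computed as the magnitude of the mean unit phasor of the phase differences across trials. The sketch below uses that common definition; the function name `phase_locking_value` and the toy phases are assumptions for illustration, and this is not the torus graph estimator itself.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b) -> float:
    """PLV between two sequences of phase angles (radians), one per trial.

    Returns a value in [0, 1]: 1 means the phase difference is identical
    on every trial, values near 0 mean no consistent phase relationship.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two non-empty sequences of equal length")
    phasor_sum = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(phasor_sum) / len(phases_a)


# Toy data: a constant phase lag across trials gives perfect locking.
trial_phases = [0.0, 1.0, 2.0, 3.0]
lagged = [p - 0.5 for p in trial_phases]
print(phase_locking_value(trial_phases, lagged))
```

Because PLV looks at one pair of signals at a time, it is a marginal measure and cannot by itself separate direct coupling from coupling mediated by a third signal, which is the limitation the abstract attributes to it.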
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data
Varma, Paroma, He, Bryan, Iter, Dan, Xu, Peng, Yu, Rose, De Sa, Christopher, Ré, Christopher
A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but it assumes that a single accuracy parameter is sufficient to model the behavior of each source over the entire training set. In particular, it fails to model latent subsets of the training data on which the supervision sources perform differently than they do on average. We present Socratic learning, a paradigm that uses feedback from a corresponding discriminative model to automatically identify these subsets and augments the structure of the generative model accordingly. Experimentally, we show that without any ground truth labels, the augmented generative model reduces error by up to 56.06% for a relation extraction task compared to a state-of-the-art weak supervision technique that utilizes generative models.
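The limitation of a single accuracy parameter can be seen in a toy example. The sketch below is only an illustration of the problem statement, not of Socratic learning itself: the votes, labels, and subset flags are made-up data, and the ground truth labels are used here solely to expose the gap that the method is described as finding without them.

```python
def accuracy(votes, labels) -> float:
    """Fraction of items on which a supervision source matches the label."""
    return sum(v == y for v, y in zip(votes, labels)) / len(labels)


def accuracy_by_subset(votes, labels, in_subset) -> dict:
    """Accuracy computed separately inside and outside a flagged subset."""
    result = {}
    for flag in (False, True):
        idx = [i for i, f in enumerate(in_subset) if f == flag]
        result[flag] = accuracy([votes[i] for i in idx], [labels[i] for i in idx])
    return result


# Made-up data: the source is perfect on most items but always wrong on a
# small subset (the last two items), which the average accuracy hides.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
votes = [1, 1, 1, 1, 0, 0, 1, 1]
in_subset = [False] * 6 + [True] * 2

print(accuracy(votes, labels))
print(accuracy_by_subset(votes, labels, in_subset))
```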
- North America > United States > California (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)